03 - R framework with IMPACT - session 3

Author

Yann Say

Published

April 30, 2024

library(addindicators)
library(dplyr)

my_data <- addindicators::addindicators_MSNA_template_data

Composition - adding indicators

The framework is built around 4 steps: cleaning, composition, analysis, outputs

  • Cleaning: any manipulation to go from the raw data to the clean data
  • Composition: any manipulation before the analysis e.g. adding indicators, adding information from loop or main, aok aggregation, etc.
  • Analysis: any manipulation regarding only the analysis
  • Outputs: any manipulation to format the outputs. Outputs are created from the results table, from the stat + analysis key

The following section introduces composition, in particular how to add indicators and how to review them.

add_*

add_* functions add a variable (column) to the dataset, for example the duration of a survey or the food consumption score category.

An add_* function takes a dataset as input and returns the dataset plus the new indicator (and any intermediate columns used for the calculation).

For example, to check the duration of a survey, the dataset only has the start and end columns, not a duration column; an add_* function would compute it and add it.

With addindicators, some intermediate columns may be added if they are used to create the new indicator.
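The add_* pattern described above can be sketched in base R. The function add_duration_sketch, its arguments and the toy columns are hypothetical names used only for illustration; they are not part of addindicators.

```r
# Hypothetical helper, NOT part of addindicators: it only illustrates the
# add_* pattern (dataset in, dataset plus new columns out).
add_duration_sketch <- function(dataset,
                                start_column = "start",
                                end_column = "end") {
  # Intermediate columns: the parsed timestamps are kept in the output
  dataset$start_parsed <- as.POSIXct(dataset[[start_column]], tz = "UTC")
  dataset$end_parsed <- as.POSIXct(dataset[[end_column]], tz = "UTC")
  # New indicator: duration of the survey in minutes
  dataset$duration_minutes <- as.numeric(
    difftime(dataset$end_parsed, dataset$start_parsed, units = "mins")
  )
  dataset
}

# Toy dataset with only start and end, as in the example above
toy_data <- data.frame(
  uuid = c("a", "b"),
  start = c("2024-04-30 09:00:00", "2024-04-30 10:15:00"),
  end = c("2024-04-30 09:45:00", "2024-04-30 10:45:00"),
  stringsAsFactors = FALSE
)

toy_data_with_duration <- add_duration_sketch(toy_data)
toy_data_with_duration[, c("uuid", "duration_minutes")]
```

Because the function returns the full dataset, it can be chained with other functions of the same family.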

add_fcs

my_data_with_fcs <- my_data %>% add_fcs(
  cutoffs = "normal",
  fsl_fcs_cereal = "fs_fcs_cereals_grains_roots_tubers",
  fsl_fcs_legumes = "fs_fcs_beans_nuts",
  fsl_fcs_veg = "fs_fcs_vegetables_leaves",
  fsl_fcs_fruit = "fs_fcs_fruit",
  fsl_fcs_meat = "fs_fcs_meat_fish_eggs",
  fsl_fcs_dairy = "fs_fcs_dairy",
  fsl_fcs_sugar = "fs_fcs_sugar",
  fsl_fcs_oil = "fs_fcs_oil_fat_butter"
)

my_data_with_fcs[, tail(names(my_data_with_fcs), 10)] %>%
  head()
fcs_weight_cereal1 fcs_weight_legume2 fcs_weight_dairy3 fcs_weight_meat4 fcs_weight_veg5 fcs_weight_fruit6 fcs_weight_oil7 fcs_weight_sugar8 fsl_fcs_score fsl_fcs_cat
0 0 8 28 5 6 1.0 0.5 48.5 Acceptable
4 6 12 16 7 2 2.5 1.0 50.5 Acceptable
8 0 0 24 7 6 1.0 0.5 46.5 Acceptable
14 3 4 12 4 2 1.0 2.5 42.5 Acceptable
6 6 4 20 6 7 3.5 3.0 55.5 Acceptable
0 6 16 4 7 1 0.0 3.5 37.5 Acceptable
Note

You can learn more about food security indicators here.
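To make the fcs_weight_* columns easier to read, here is a minimal base R sketch of how a food consumption score can be obtained as a weighted sum. The weights are the ones implied by the output above (they match the standard FCS weights). The toy day counts are chosen so that they reproduce the first two rows of that output. The 21/35 thresholds are an assumption about what cutoffs = "normal" means, so check the package documentation before relying on them.

```r
# Weights implied by the fcs_weight_* columns shown above
fcs_weights <- c(cereal = 2, legumes = 3, dairy = 4, meat = 4,
                 veg = 1, fruit = 1, oil = 0.5, sugar = 0.5)

# Toy data: number of days (0 to 7) each food group was eaten.
# Chosen so the weighted values match the first two rows of the output above.
toy_days <- data.frame(
  cereal = c(0, 2), legumes = c(0, 2), dairy = c(2, 3), meat = c(7, 4),
  veg = c(5, 7), fruit = c(6, 2), oil = c(2, 5), sugar = c(1, 2)
)

# Score: sum over food groups of (days eaten * weight)
fcs_score <- as.vector(as.matrix(toy_days[, names(fcs_weights)]) %*% fcs_weights)

# Category: ASSUMED thresholds (Poor <= 21, Borderline <= 35, Acceptable > 35)
fcs_cat <- cut(fcs_score,
               breaks = c(-Inf, 21, 35, Inf),
               labels = c("Poor", "Borderline", "Acceptable"))

data.frame(fcs_score, fcs_cat)
```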

add_hhs

Pipe-able

The framework is built around 2 adjectives, pipe-able and independent. In the framework, functions of the same family should be pipe-able. In the following case, 2 add_* functions are piped.

my_data_with_indicators <- my_data %>%
  add_fcs(
  cutoffs = "normal",
  fsl_fcs_cereal = "fs_fcs_cereals_grains_roots_tubers",
  fsl_fcs_legumes = "fs_fcs_beans_nuts",
  fsl_fcs_veg = "fs_fcs_vegetables_leaves",
  fsl_fcs_fruit = "fs_fcs_fruit",
  fsl_fcs_meat = "fs_fcs_meat_fish_eggs",
  fsl_fcs_dairy = "fs_fcs_dairy",
  fsl_fcs_sugar = "fs_fcs_sugar",
  fsl_fcs_oil = "fs_fcs_oil_fat_butter"
  ) %>%
  add_hhs(
  )

my_data_with_indicators[, tail(names(my_data_with_indicators), 14)] %>%
  head()
fsl_fcs_score fsl_fcs_cat fs_hhs_nofood_yn_recoded fs_hhs_nofood_freq_recoded fs_hhs_sleephungry_yn_recoded fs_hhs_sleephungry_freq_recoded fs_hhs_daynoteating_yn_recoded fs_hhs_daynoteating_freq_recoded hhs_comp1 hhs_comp2 hhs_comp3 hhs_score hhs_cat_ipc hhs_cat
48.5 Acceptable 0 0 0 0 0 0 0 0 0 0 None No or Little
50.5 Acceptable 1 1 0 0 0 1 1 0 0 1 Little No or Little
46.5 Acceptable 0 0 1 2 1 0 0 2 0 2 Moderate Moderate
42.5 Acceptable 0 0 0 1 1 0 0 0 0 0 None No or Little
55.5 Acceptable 0 0 0 1 1 0 0 0 0 0 None No or Little
37.5 Acceptable 1 1 0 2 1 2 1 0 2 3 Moderate Moderate

Composition - reviewing indicators

Reviewing indicators compares 2 indicators and presents the differences. It does not check how the indicator was created, nor does it check for inconsistencies. That means that, to review an indicator, it is necessary to create it again and compare the two. The functions review_one_variable and review_variables focus on the comparison.

review_variables

First, a new dataset can be created for the review.

review_df <- addindicators_MSNA_template_data %>%
  add_fcs(
  cutoffs = "normal",
  fsl_fcs_cereal = "fs_fcs_cereals_grains_roots_tubers",
  fsl_fcs_legumes = "fs_fcs_beans_nuts",
  fsl_fcs_veg = "fs_fcs_vegetables_leaves",
  fsl_fcs_fruit = "fs_fcs_fruit",
  fsl_fcs_meat = "fs_fcs_meat_fish_eggs",
  fsl_fcs_dairy = "fs_fcs_dairy",
  fsl_fcs_sugar = "fs_fcs_sugar",
  fsl_fcs_oil = "fs_fcs_oil_fat_butter"
  ) %>%
  select(uuid, fsl_fcs_score, fsl_fcs_cat)

Then the dataset to be reviewed and the new dataset can be joined together.

binded_df <- my_data_with_indicators %>%
  full_join(review_df, by = "uuid")
Note

I would advise using a full_join rather than a left_join or right_join. That way, if any computation has missing values, they will be spotted.

Note

With the join_* functions, if column names are the same in both datasets, .x and .y will be added to the names.
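The two notes above can be illustrated on toy data. This sketch uses base R merge() with all = TRUE as an analogue of dplyr's full_join, since both apply the .x and .y suffixes by default; the two small data frames are invented for the example.

```r
# Toy version of the dataset to review (invented values)
calculated <- data.frame(
  uuid = c("a", "b", "c"),
  fsl_fcs_cat = c("Poor", "Acceptable", "Borderline"),
  stringsAsFactors = FALSE
)

# Toy version of the recomputed indicator (invented values)
recomputed <- data.frame(
  uuid = c("b", "c", "d"),
  fsl_fcs_cat = c("Acceptable", "Poor", "Acceptable"),
  stringsAsFactors = FALSE
)

# all = TRUE keeps the rows from both sides, like a full join
joined <- merge(calculated, recomputed, by = "uuid", all = TRUE)

# The shared column name gets the .x and .y suffixes
names(joined)

# Rows present on one side only show up with a missing value
joined[!complete.cases(joined), ]
```

With a left or right join, one of the two unmatched rows would have been dropped silently instead of appearing with a missing value.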

review_*

review_* functions review an object by comparing it to standards or to another object and flag the differences, e.g. reviewing the cleaning by comparing the raw dataset, the clean dataset and the cleaning log, or reviewing an analysis by comparing it with another analysis.

review_one_var <- review_variables(binded_df,
  columns_to_review = "fsl_fcs_cat.x",
  columns_to_compare_with = "fsl_fcs_cat.y")


review_one_var %>% 
  names()
[1] "dataset"      "review_table"

It is a list with the dataset and a review table.

review_one_var$review_table %>% 
  head()
uuid variable review_check review_comment
eaf540cd-32bd-41474b-b4beb5-d62fc987e45a fsl_fcs_cat.x TRUE Same results
89e706c3-53d8-4a4049-898586-4926085db71e fsl_fcs_cat.x TRUE Same results
afd921c6-e54a-4c4740-919c93-87f59bd0e63a fsl_fcs_cat.x TRUE Same results
d8b05f39-ba85-494c4d-808c84-9dc57823a4f1 fsl_fcs_cat.x TRUE Same results
d6b42f9e-c209-4c4541-808a81-86bea53df142 fsl_fcs_cat.x TRUE Same results
f1b9ec67-20db-47404d-a3ada0-1a37e5c49d02 fsl_fcs_cat.x TRUE Same results

The review table can be summarised to have a quicker overview.

review_one_var$review_table %>%
  group_by(review_check, review_comment) %>%
  tally()
review_check review_comment n
TRUE Same results 100

To see how differences are shown, some noise is introduced to the dataset.

jittered_df <- binded_df
# Replace 5 random values in each category column with a random category
set.seed(123)
jittered_df[sample(1:nrow(jittered_df), 5), "fsl_fcs_cat.x"] <- sample(unique(jittered_df$fsl_fcs_cat.y), 5, replace = TRUE)
set.seed(124)
jittered_df[sample(1:nrow(jittered_df), 5), "fsl_fcs_cat.y"] <- sample(unique(jittered_df$fsl_fcs_cat.y), 5, replace = TRUE)
# Set 5 random values in each category column to missing
set.seed(125)
jittered_df[sample(1:nrow(jittered_df), 5), "fsl_fcs_cat.x"] <- NA
set.seed(1236)
jittered_df[sample(1:nrow(jittered_df), 5), "fsl_fcs_cat.y"] <- NA
# Replace 5 random scores with another score from the dataset
set.seed(1237)
jittered_df[sample(1:nrow(jittered_df), 5), "fsl_fcs_score.x"] <- sample(unique(jittered_df$fsl_fcs_score.x), 5, replace = TRUE)
review_one_variable_jittered <- review_variables(jittered_df,
  columns_to_review = "fsl_fcs_cat.x",
  columns_to_compare_with = "fsl_fcs_cat.y")

review_one_variable_jittered$review_table %>%
  group_by(review_check, review_comment) %>%
  tally()
review_check review_comment n
FALSE Different results 9
FALSE Missing in fsl_fcs_cat.x 5
FALSE Missing in fsl_fcs_cat.y 5
TRUE Same results 81
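The comments in the summary above can be reproduced with a small comparison routine. The following is only a base R sketch of that kind of logic, not the implementation of review_variables: the function name compare_columns_sketch is hypothetical, and treating two missing values as "Same results" is an assumption.

```r
# Hypothetical sketch of a pair-wise column comparison (not the package code)
compare_columns_sketch <- function(dataset, column_to_review, column_to_compare_with) {
  reviewed <- dataset[[column_to_review]]
  compared <- dataset[[column_to_compare_with]]

  # Start from "Different results" and overwrite the other cases
  comment <- rep("Different results", nrow(dataset))
  both_present <- !is.na(reviewed) & !is.na(compared)
  comment[both_present & reviewed == compared] <- "Same results"
  # ASSUMPTION: two missing values are considered identical
  comment[is.na(reviewed) & is.na(compared)] <- "Same results"
  comment[is.na(reviewed) & !is.na(compared)] <- paste("Missing in", column_to_review)
  comment[!is.na(reviewed) & is.na(compared)] <- paste("Missing in", column_to_compare_with)

  data.frame(
    uuid = dataset$uuid,
    variable = column_to_review,
    review_check = comment == "Same results",
    review_comment = comment,
    stringsAsFactors = FALSE
  )
}

# Toy data covering each case (invented values)
toy_df <- data.frame(
  uuid = c("a", "b", "c", "d", "e"),
  fsl_fcs_cat.x = c("Acceptable", "Poor", NA, "Borderline", NA),
  fsl_fcs_cat.y = c("Acceptable", "Acceptable", "Poor", NA, NA),
  stringsAsFactors = FALSE
)

toy_review <- compare_columns_sketch(toy_df, "fsl_fcs_cat.x", "fsl_fcs_cat.y")
table(toy_review$review_comment)
```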

The dataset has new columns to help filter for further investigation.

review_one_variable_jittered$dataset[, tail(names(review_one_variable_jittered$dataset), 5)] %>%
  head()
hhs_cat fsl_fcs_score.y fsl_fcs_cat.y review_check_fsl_fcs_cat.x review_comment_fsl_fcs_cat.x
No or Little 48.5 Acceptable TRUE Same results
No or Little 50.5 Acceptable TRUE Same results
Moderate 46.5 Acceptable TRUE Same results
No or Little 42.5 Acceptable TRUE Same results
No or Little 55.5 Acceptable TRUE Same results
Moderate 37.5 Poor FALSE Different results
review_one_variable_jittered$dataset %>%
  filter(!review_check_fsl_fcs_cat.x) %>%
  select(uuid, fsl_fcs_cat.x, fsl_fcs_cat.y, review_check_fsl_fcs_cat.x, review_comment_fsl_fcs_cat.x)
uuid fsl_fcs_cat.x fsl_fcs_cat.y review_check_fsl_fcs_cat.x review_comment_fsl_fcs_cat.x
f1b9ec67-20db-47404d-a3ada0-1a37e5c49d02 Acceptable Poor FALSE Different results
42dc8573-e2d0-43484b-aaada2-c37ef865d041 Borderline Acceptable FALSE Different results
fcd69a08-498f-4c4b47-989799-743cbe5fd960 NA Acceptable FALSE Missing in fsl_fcs_cat.x
d36b8cfa-bf52-48434f-8b8d88-ef620d73c941 Poor Acceptable FALSE Different results
72095b68-8c51-484245-929b97-360eda2c4b81 NA Acceptable FALSE Missing in fsl_fcs_cat.x
675eb3d0-62ba-4e4b42-a3a4af-08f14dce6592 Poor Acceptable FALSE Different results
952efab8-4a32-4a4743-bcb4b8-d9a534fb8627 Poor Acceptable FALSE Different results
01648f7a-8521-4d4347-afa3a7-7fd83ea05916 Acceptable Borderline FALSE Different results
10db47e8-a721-49454a-888385-59068d4a3ef2 Poor Borderline FALSE Different results
46c81eb3-7243-414f40-919e98-270439d8fbc5 NA Acceptable FALSE Missing in fsl_fcs_cat.x
765d2341-5df2-43484a-8e888b-362fa085de17 Acceptable NA FALSE Missing in fsl_fcs_cat.y
0dea8527-2ab9-4e4844-88868e-8379e514ca2b NA Acceptable FALSE Missing in fsl_fcs_cat.x
da823c6f-d215-43474c-b4b0b6-6ba519748c0f NA Acceptable FALSE Missing in fsl_fcs_cat.x
e76f4382-b3ea-4b404e-bebfb0-a6b2d7594c18 Acceptable Poor FALSE Different results
42698e10-7e19-4d4744-8d8b8c-04d187cb62ae Borderline NA FALSE Missing in fsl_fcs_cat.y
bdc23a6e-1a35-49474b-949e9b-57bf01284ea9 Acceptable NA FALSE Missing in fsl_fcs_cat.y
2bd1809c-a2c1-424b44-b7b0b4-9a53b24d0e67 Borderline Acceptable FALSE Different results
de416c95-845f-4d4f40-868f8c-e12b3946ad07 Acceptable NA FALSE Missing in fsl_fcs_cat.y
31f4e76d-c64e-4b4144-bbb1bf-b05ca69d823e Acceptable NA FALSE Missing in fsl_fcs_cat.y

If there is more than one variable to review, pair-wise vectors can be used.

my_review <- review_variables(jittered_df,
  columns_to_review = c("fsl_fcs_cat.x", "fsl_fcs_score.x"),
  columns_to_compare_with = c("fsl_fcs_cat.y", "fsl_fcs_score.y")
)
my_review$review_table %>%
  group_by(variable, review_check, review_comment) %>%
  tally()
variable review_check review_comment n
fsl_fcs_cat.x FALSE Different results 9
fsl_fcs_cat.x FALSE Missing in fsl_fcs_cat.x 5
fsl_fcs_cat.x FALSE Missing in fsl_fcs_cat.y 5
fsl_fcs_cat.x TRUE Same results 81
fsl_fcs_score.x FALSE Different results 5
fsl_fcs_score.x TRUE Same results 95
my_review$dataset %>%
  filter(!review_check_fsl_fcs_cat.x) %>%
  select(uuid, fsl_fcs_cat.x, fsl_fcs_cat.y, review_comment_fsl_fcs_cat.x)
uuid fsl_fcs_cat.x fsl_fcs_cat.y review_comment_fsl_fcs_cat.x
f1b9ec67-20db-47404d-a3ada0-1a37e5c49d02 Acceptable Poor Different results
42dc8573-e2d0-43484b-aaada2-c37ef865d041 Borderline Acceptable Different results
fcd69a08-498f-4c4b47-989799-743cbe5fd960 NA Acceptable Missing in fsl_fcs_cat.x
d36b8cfa-bf52-48434f-8b8d88-ef620d73c941 Poor Acceptable Different results
72095b68-8c51-484245-929b97-360eda2c4b81 NA Acceptable Missing in fsl_fcs_cat.x
675eb3d0-62ba-4e4b42-a3a4af-08f14dce6592 Poor Acceptable Different results
952efab8-4a32-4a4743-bcb4b8-d9a534fb8627 Poor Acceptable Different results
01648f7a-8521-4d4347-afa3a7-7fd83ea05916 Acceptable Borderline Different results
10db47e8-a721-49454a-888385-59068d4a3ef2 Poor Borderline Different results
46c81eb3-7243-414f40-919e98-270439d8fbc5 NA Acceptable Missing in fsl_fcs_cat.x
765d2341-5df2-43484a-8e888b-362fa085de17 Acceptable NA Missing in fsl_fcs_cat.y
0dea8527-2ab9-4e4844-88868e-8379e514ca2b NA Acceptable Missing in fsl_fcs_cat.x
da823c6f-d215-43474c-b4b0b6-6ba519748c0f NA Acceptable Missing in fsl_fcs_cat.x
e76f4382-b3ea-4b404e-bebfb0-a6b2d7594c18 Acceptable Poor Different results
42698e10-7e19-4d4744-8d8b8c-04d187cb62ae Borderline NA Missing in fsl_fcs_cat.y
bdc23a6e-1a35-49474b-949e9b-57bf01284ea9 Acceptable NA Missing in fsl_fcs_cat.y
2bd1809c-a2c1-424b44-b7b0b4-9a53b24d0e67 Borderline Acceptable Different results
de416c95-845f-4d4f40-868f8c-e12b3946ad07 Acceptable NA Missing in fsl_fcs_cat.y
31f4e76d-c64e-4b4144-bbb1bf-b05ca69d823e Acceptable NA Missing in fsl_fcs_cat.y
my_review$dataset %>%
  filter(!review_check_fsl_fcs_score.x) %>%
  select(uuid, fsl_fcs_score.x, fsl_fcs_score.y, review_comment_fsl_fcs_score.x)
uuid fsl_fcs_score.x fsl_fcs_score.y review_comment_fsl_fcs_score.x
afd921c6-e54a-4c4740-919c93-87f59bd0e63a 87.0 46.5 Different results
c14529b6-06f7-4d4446-a1a7ac-b3648529ef7d 61.0 78.5 Different results
4a78b1d6-ad91-4b4f45-aba5a0-bed8cf257106 41.5 30.0 Different results
26bce981-8217-464d45-979a96-8a69e5371b02 76.5 47.5 Different results
ad31fe62-5a3c-484d49-b7b8bc-a79d8304be65 37.5 42.5 Different results

Exercises

Exercise 1

  • Add the food consumption matrix score to the dataset. The food consumption matrix score is a food security indicator that uses the food consumption score, household hunger score and the reduced coping strategy index.
name label::english type
rCSILessQlty During the last 7 days, were there days (and, if so, how many) when your household had to rely on less preferred and less expensive food to cope with a lack of food or money to buy it? integer
rCSIBorrow During the last 7 days, were there days (and, if so, how many) when your household had to borrow food or rely on help from a relative or friend to cope with a lack of food or money to buy it? integer
rCSIMealSize During the last 7 days, were there days (and, if so, how many) when your household had to limit portion size of meals at meal times to cope with a lack of food or money to buy it? integer
rCSIMealAdult During the last 7 days, were there days (and, if so, how many) when your household had to restrict consumption by adults in order for small children to eat to cope with a lack of food or money to buy it? integer
rCSIMealNb During the last 7 days, were there days (and, if so, how many) when your household had to reduce number of meals eaten in a day to cope with a lack of food or money to buy it? integer
library(addindicators)
library(dplyr)
exercise_data <- addindicators_MSNA_template_data %>%
  add_fcs(
  cutoffs = "normal",
  fsl_fcs_cereal = "fs_fcs_cereals_grains_roots_tubers",
  fsl_fcs_legumes = "fs_fcs_beans_nuts",
  fsl_fcs_veg = "fs_fcs_vegetables_leaves",
  fsl_fcs_fruit = "fs_fcs_fruit",
  fsl_fcs_meat = "fs_fcs_meat_fish_eggs",
  fsl_fcs_dairy = "fs_fcs_dairy",
  fsl_fcs_sugar = "fs_fcs_sugar",
  fsl_fcs_oil = "fs_fcs_oil_fat_butter"
  ) %>%
  add_hhs(
  )

Did you try the function add_fcm_phase?

The food consumption matrix needs 3 indicators, FCS, rCSI, HHS.

Have you used the correct HHS category variable?

my_answer <- exercise_data %>% add_rcsi(
  ) %>%
  add_fcm_phase(
    fcs_column_name = "fsl_fcs_cat",
    rcsi_column_name = "rcsi_cat",
    hhs_column_name = "hhs_cat_ipc",
    fcs_categories_acceptable = "Acceptable",
    fcs_categories_poor = "Poor",
    fcs_categories_borderline = "Borderline",
    rcsi_categories_low = "No to Low",
    rcsi_categories_medium = "Medium",
    rcsi_categories_high = "High",
    hhs_categories_none = "None",
    hhs_categories_little = "Little",
    hhs_categories_moderate = "Moderate",
    hhs_categories_severe = "Severe",
    hhs_categories_very_severe = "Very Severe"
  )

Exercise 2

  • You receive a dataset, you need to review the following indicators.

    • Food Consumption Score: fcs_score, fcs_cat
    • Household Hunger Score: hhs_score, hhs_cat

Don’t forget to write the review.

dataset_to_review <- read.csv("inputs/06 - exercise - dataset_to_review.csv")

dataset_without_indicators <- addindicators::addindicators_MSNA_template_data

Did you try the function review_variables?

How was the FCS created?

How was the category for the HHS coded?

my_review <- dataset_without_indicators %>% 
    add_fcs(
  cutoffs = "normal",
  fsl_fcs_cereal = "fs_fcs_cereals_grains_roots_tubers",
  fsl_fcs_legumes = "fs_fcs_beans_nuts",
  fsl_fcs_veg = "fs_fcs_vegetables_leaves",
  fsl_fcs_fruit = "fs_fcs_fruit",
  fsl_fcs_meat = "fs_fcs_meat_fish_eggs",
  fsl_fcs_dairy = "fs_fcs_dairy",
  fsl_fcs_sugar = "fs_fcs_sugar",
  fsl_fcs_oil = "fs_fcs_oil_fat_butter"
  )  %>% add_hhs(
    hhs_nofoodhh_1 = "fs_hhs_nofood_yn",
    hhs_nofoodhh_1a = "fs_hhs_nofood_freq",
    hhs_sleephungry_2 = "fs_hhs_sleephungry_yn",
    hhs_sleephungry_2a = "fs_hhs_sleephungry_freq",
    hhs_alldaynight_3 = "fs_hhs_daynoteating_yn",
    hhs_alldaynight_3a = "fs_hhs_daynoteating_freq",
    yes_answer = "yes",
    no_answer = "no",
    rarely_answer = "rarely_1_2",
    sometimes_answer = "sometimes_3_10",
    often_answer = "often_10_times"
  ) %>% 
  select(uuid, fsl_fcs_cat, fsl_fcs_score, hhs_cat, hhs_score)
dataset_to_review <- full_join(dataset_to_review, my_review, by = "uuid")

review <- dataset_to_review %>% 
  review_variables(columns_to_review = c("fsl_fcs_cat.x", "fsl_fcs_score.x", "hhs_cat.x", "hhs_score.x"),
                   columns_to_compare_with = c("fsl_fcs_cat.y", "fsl_fcs_score.y", "hhs_cat.y", "hhs_score.y"))

review$review_table %>% 
  group_by(variable,review_check,review_comment) %>% 
  tally()
  • There are 10 FCS categories that are different.
  • There are 100 HHS categories that are different.
review$dataset %>% 
  filter(!review_check_fsl_fcs_cat.x) %>% 
  select(uuid, review_comment_fsl_fcs_cat.x, fsl_fcs_score.x, fsl_fcs_cat.x, fsl_fcs_cat.y)
uuid review_comment_fsl_fcs_cat.x fsl_fcs_score.x fsl_fcs_cat.x fsl_fcs_cat.y
f1b9ec67-20db-47404d-a3ada0-1a37e5c49d02 Different results 37.5 Borderline Acceptable
e21a34f5-1a46-42404b-b7b6be-7bc9286d0f13 Different results 36.0 Borderline Acceptable
42dc8573-e2d0-43484b-aaada2-c37ef865d041 Different results 39.0 Borderline Acceptable
fcd69a08-498f-4c4b47-989799-743cbe5fd960 Different results 36.5 Borderline Acceptable
4d1cae02-49e0-484c4e-8f8d8c-a97dec246310 Different results 41.0 Borderline Acceptable
a8319ceb-857c-434142-b9b8bc-7905684fc1d3 Different results 37.5 Borderline Acceptable
6d1acb45-cfb6-4b4441-87888f-e2a4756d30f9 Different results 38.0 Borderline Acceptable
0dea8527-2ab9-4e4844-88868e-8379e514ca2b Different results 40.0 Borderline Acceptable
7e94afc5-af0b-4a4c46-bebcb7-9f60d1e53a47 Different results 37.0 Borderline Acceptable
b719ef08-bdf5-474240-858d8a-a12bc65d349e Different results 41.5 Borderline Acceptable
  • The Food Consumption Score categories are different. Which thresholds were used to compute the FCS categories? Maybe 28-42?
review$dataset %>% 
  filter(!review_check_hhs_cat.x) %>% 
  select(hhs_cat.x, hhs_cat.y) %>% 
  table(useNA = "ifany")
              hhs_cat.y
hhs_cat.x      Moderate No or Little Severe
  moderate           29            0      0
  no_or_little        0           58      0
  severe              0            0     13
  • HHS is fine; only the labelling is different.

Analysis - Introduction

The framework is built around 4 steps: cleaning, composition, analysis, outputs

  • Cleaning: any manipulation to go from the raw data to the clean data
  • Composition: any manipulation before the analysis e.g. adding indicators, adding information from loop or main, aok aggregation, etc.
  • Analysis: any manipulation regarding only the analysis
  • Outputs: any manipulation to format the outputs.

The following section introduces the analysis.

The third step of the framework is the analysis. The analysis step aims to create a long table with one result per line and an analysis key. That table is not meant to be read by a human but to store information. Analysis stops at the results table: long format, stat + analysis key.

The analysis key format is currently:

  • analysis type @/@ analysis variable %/% analysis variable value @/@ grouping variable %/% grouping variable value

  • analysis type @/@ dependent variable %/% dependent variable value @/@ independent variable %/% independent variable value

If there are two or more grouping variables, it would look like this:

  • analysis type @/@ analysis variable %/% analysis variable value @/@ grouping variable 1 %/% grouping variable value 1 -/- grouping variable 2 %/% grouping variable value 2

The same would apply to the analysis variable in the case of a ratio.

The current analysis types are:
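The key format above can be built and split with plain string functions. The sketch below is in base R; build_key_sketch and parse_key_sketch are hypothetical helper names and not analysistools functions. The separators are the ones listed above, with a space on each side as in the results table output.

```r
# Hypothetical helper: build an analysis key from its components
build_key_sketch <- function(analysis_type, analysis_var, analysis_var_value,
                             group_var = NA, group_var_value = NA) {
  # variable %/% value pairs, several pairs being joined with -/-
  analysis_part <- paste(analysis_var, analysis_var_value,
                         sep = " %/% ", collapse = " -/- ")
  group_part <- paste(group_var, group_var_value,
                      sep = " %/% ", collapse = " -/- ")
  paste(analysis_type, analysis_part, group_part, sep = " @/@ ")
}

# Hypothetical helper: split a key back into its components
parse_key_sketch <- function(key) {
  parts <- strsplit(key, " @/@ ", fixed = TRUE)[[1]]
  split_pairs <- function(part) {
    pairs <- strsplit(strsplit(part, " -/- ", fixed = TRUE)[[1]],
                      " %/% ", fixed = TRUE)
    data.frame(
      variable = vapply(pairs, `[`, character(1), 1),
      value = vapply(pairs, `[`, character(1), 2),
      stringsAsFactors = FALSE
    )
  }
  list(
    analysis_type = parts[1],
    analysis = split_pairs(parts[2]),
    group = split_pairs(parts[3])
  )
}

# No grouping variable: the group part is filled with NA
key_overall <- build_key_sketch("prop_select_one", "enum_gender", "female")

# Two grouping variables
key_two_groups <- build_key_sketch(
  "mean", "respondent_age", NA,
  group_var = c("admin1", "respondent_gender"),
  group_var_value = c("admin1a", "female")
)

parsed <- parse_key_sketch(key_two_groups)
parsed$group
```

Note that after parsing, missing components come back as the string "NA" rather than a real missing value.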

  • mean
  • median
  • prop_select_one: proportion for select one
  • prop_select_multiple: proportion for select multiple
  • ratio

create_analysis

Any create_analysis_* function needs a survey object, not a dataset. A survey object is defined with the weights, strata and cluster information, if they exist.

create_*

create_* functions create or transform something, e.g. a cleaning log with the checks to be filled, an analysis results table, or an output.

Outputs from create_* functions can come in different shapes, formats, etc.

The create_* family is a catch-all.

library(analysistools)
# Flag the columns that contain only missing values
only_nas <- my_data_with_indicators %>%
  summarise(across(.cols = everything(), .fns = function(x) {
    sum(is.na(x)) == nrow(my_data_with_indicators)
  })) %>%
  do.call(c, .)

# Drop the all-NA columns, then the columns with "other" in their name
my_data_shorter <- my_data_with_indicators[, !only_nas] %>%
  select(!grep("other", names(my_data_with_indicators), value = TRUE))
Note

At the moment, create_analysis breaks when a column has only missing values. Such columns need to be removed beforehand.
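For reference, the same clean-up can be sketched in base R on a toy data frame. This is an alternative way of writing the step, not the code used in this session; the column names are invented.

```r
# Toy data with one column that contains only missing values
toy_data <- data.frame(
  uuid = c("a", "b", "c"),
  age = c(25, NA, 40),
  empty_column = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# TRUE for each column where every value is missing
only_nas <- vapply(toy_data, function(x) all(is.na(x)), logical(1))

# Keep the other columns; drop = FALSE keeps a data frame even with one column
toy_data_shorter <- toy_data[, !only_nas, drop = FALSE]
names(toy_data_shorter)
```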

my_design <- srvyr::as_survey_design(my_data_shorter, strata = "admin1")

my_analysis <- create_analysis(my_design, sm_separator = "/")
Note

create_analysis uses a survey design, not a dataset. A survey design object contains the information on the design of the survey, such as strata, clusters and weights. It is built with the srvyr package, which is a wrapper around the survey package.

create_analysis returns a list with:

  • Results table: a long table with summary statistics per line
  • Dataset: the dataset used with the survey design
  • List of analysis: all the analyses that were performed
my_analysis %>%
  names()
[1] "results_table" "dataset"       "loa"          
my_analysis$results_table %>%
  head()
analysis_type analysis_var analysis_var_value group_var group_var_value stat stat_low stat_upp n n_total n_w n_w_total analysis_key
prop_select_one enum_gender female NA NA 0.33 0.2356238 0.4243762 33 100 33 100 prop_select_one @/@ enum_gender %/% female @/@ NA %/% NA
prop_select_one enum_gender male NA NA 0.29 0.1989592 0.3810408 29 100 29 100 prop_select_one @/@ enum_gender %/% male @/@ NA %/% NA
prop_select_one enum_gender other NA NA 0.38 0.2829941 0.4770059 38 100 38 100 prop_select_one @/@ enum_gender %/% other @/@ NA %/% NA
prop_select_one hoh no NA NA 0.50 0.3995960 0.6004040 50 100 50 100 prop_select_one @/@ hoh %/% no @/@ NA %/% NA
prop_select_one hoh yes NA NA 0.50 0.3995960 0.6004040 50 100 50 100 prop_select_one @/@ hoh %/% yes @/@ NA %/% NA
prop_select_one respondent_able_to_answer no NA NA 0.57 0.4736852 0.6663148 57 100 57 100 prop_select_one @/@ respondent_able_to_answer %/% no @/@ NA %/% NA
my_analysis$loa %>%
  head()
analysis_type analysis_var group_var level
prop_select_one enum_gender NA 0.95
prop_select_one hoh NA 0.95
prop_select_one respondent_able_to_answer NA 0.95
mean respondent_age NA 0.95
median respondent_age NA 0.95
prop_select_one respondent_gender NA 0.95

Exercises

Exercise 1

library(analysistools)
exercise_data <- analysistools_MSNA_template_data

only_nas <- exercise_data %>%
  summarise(across(.cols = everything(), .fns = function(x) {
    sum(is.na(x)) == nrow(exercise_data)
  })) %>%
  do.call(c, .)

exercise_data_shorter <- exercise_data[, !only_nas] %>%
  select(!grep("other", names(exercise_data), value = T))

With the dataset exercise_data, please do the following:

  • Create a results table at the level respondent_gender. Keep the strata at admin1

Did you try the argument group_var in create_analysis?

my_exercise_design <- srvyr::as_survey_design(exercise_data_shorter, strata = "admin1") 

my_answer_analysis <- create_analysis(my_exercise_design, group_var = "respondent_gender", sm_separator = "/")

Exercise 2

  • The analysis should be weighted. The sampling frame is given below. Re-do the analysis at the overall level.
sampling_frame <- data.frame(
  strata = c("admin1a", "admin1b", "admin1c"),
  population = c(100000, 200000, 300000)
)
sampling_frame
strata population
admin1a 1e+05
admin1b 2e+05
admin1c 3e+05

Did you try the function add_weights?

Did you modify your design object?

exercise_data_shorter_weigthed <- exercise_data_shorter %>% 
  add_weights(sample_data = sampling_frame, strata_column_dataset = "admin1", strata_column_sample = "strata", population_column = "population")

my_exercise_design_weigthed <- srvyr::as_survey_design(exercise_data_shorter_weigthed, strata = "admin1", weights = "weights") 

my_answer_analysis_weighted <- create_analysis(my_exercise_design_weigthed, sm_separator = "/")
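To see what a weighting step like the one above can compute, here is a base R sketch of one common convention: the weight of a stratum is its share of the population divided by its share of the sample, so that the weights sum to the number of interviews. Whether add_weights uses exactly this normalisation is an assumption, and the toy dataset below is invented.

```r
# Sampling frame from the exercise
sampling_frame <- data.frame(
  strata = c("admin1a", "admin1b", "admin1c"),
  population = c(100000, 200000, 300000),
  stringsAsFactors = FALSE
)

# Toy dataset: 10 interviews spread unevenly across strata (invented)
toy_data <- data.frame(
  uuid = 1:10,
  admin1 = rep(c("admin1a", "admin1b", "admin1c"), times = c(5, 3, 2)),
  stringsAsFactors = FALSE
)

# Number of interviews per stratum, in the order of the sampling frame
n_by_strata <- table(toy_data$admin1)
sampling_frame$n_interviews <- as.numeric(n_by_strata[sampling_frame$strata])

# ASSUMED convention: population share divided by sample share
sampling_frame$weight <-
  (sampling_frame$population / sum(sampling_frame$population)) /
  (sampling_frame$n_interviews / sum(sampling_frame$n_interviews))

# Attach the stratum weight to each interview
toy_data$weights <- sampling_frame$weight[match(toy_data$admin1, sampling_frame$strata)]

sampling_frame
sum(toy_data$weights)
```

Strata that are under-represented in the sample relative to their population (here admin1c) get a weight above 1, and over-represented strata get a weight below 1.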